{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "52659821",
   "metadata": {},
   "source": [
    "#### Multi-Factor 03: Factor Preprocessing (Summary, 2023-10-24)\n",
    "\n",
    "2023-10-24 summary<br>\n",
    "https://www.bilibili.com/video/BV14P41187ej"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "642ffbd0",
   "metadata": {},
   "source": [
    "## Factor Preprocessing\n",
    "https://www.wolai.com/stupidccl/3QzCwVcyRScvSt9nUugzQG"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1debb750",
   "metadata": {},
   "source": [
    "## Removing Extreme Values and Outliers: Data Cleaning"
   ]
  },
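  {
   "cell_type": "markdown",
   "id": "8b339c99",
   "metadata": {},
   "source": [
    "### 3σ\n",
    "\n",
    "Taking **3 standard deviations** from the mean as the threshold for the market-cap factor, the following code clips the factor values to that range."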
   ]
  },
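  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ed2272a",
   "metadata": {},
   "outputs": [],
   "source": [
    "def filter_extreme_by_sigma(series, n=3):\n",
    "    # Mean of the factor values\n",
    "    mean = series.mean()\n",
    "    # Standard deviation (pandas uses the sample estimate, ddof=1)\n",
    "    std = series.std()\n",
    "    # Upper and lower bounds: mean +/- n standard deviations\n",
    "    max_value = mean + n * std\n",
    "    min_value = mean - n * std\n",
    "    # Clip with the pandas method, so no numpy import is needed\n",
    "    return series.clip(lower=min_value, upper=max_value)"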
   ]
  },
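  {
   "cell_type": "markdown",
   "id": "a7c41e90",
   "metadata": {},
   "source": [
    "As a compact restatement of what `filter_extreme_by_sigma` computes, the clipped value of each observation $x_i$ can be written as below. The symbols $\\mu$, $\\sigma$ and $\\tilde{x}_i$ are notation introduced here for illustration, not taken from the original notes.\n",
    "\n",
    "$$\n",
    "\\tilde{x}_i = \\min\\bigl(\\max(x_i,\\, \\mu - n\\sigma),\\, \\mu + n\\sigma\\bigr)\n",
    "$$\n",
    "\n",
    "Here $\\mu$ and $\\sigma$ are the mean and standard deviation of the series, and $n = 3$ by default. Values inside the band are unchanged; values outside it are pulled to the nearest bound rather than dropped. Note that pandas' `Series.std()` uses the sample estimate (`ddof=1`) by default, so $\\sigma$ here is the sample standard deviation."
   ]
  },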
  {
   "attachments": {
    "image.png": {
     "image/png": "iVBORw0KGgoAAAANSUhEUgAAAcwAAABTCAYAAAASnDekAAAaZUlEQVR4Ae2d16t0NRfG/Wu898IL8cYbEUFEEQuiKIq9994rYu+9d0Wxd8SCvXfEhr333vbHb388h2XeZO/MzJ5z5sz7BObdcybJSvJkZT3JSrLfVRoHI2AEjIARMAJGoBeBVXpTOIERMAJGwAgYASPQmDCtBEbACBgBI2AEKhAwYVaA5CRGwAgYASNgBEyY1gEjYASMgBEwAhUImDArQHISI2AEjIARMAImTOuAETACRsAIGIEKBEyYFSA5iREwAkbACBgBE6Z1wAgYASNgBIxABQImzAqQnMQIGAEjYASMgAnTOmAEjIARMAJGoAIBE2YFSE5iBIyAETACRsCEaR0wAkbACBgBI1CBgAmzAiQnMQJGwAgYASNgwrQOGAEjYASMgBGoQMCEWQGSkxgBI2AEjIARMGFaB4yAETACRsAIVCBgwqwAyUmMgBEwAkbACJgwrQNGwAgYASNgBCoQMGFWgOQkRsAIGAEjYARMmNYBI2AEjIARMAIVCJgwK0ByEiNgBIyAETACJkzrgBEwAkbACBiBCgRMmBUgOYkRMAJGwAgYAROmdcAIGAEjYASMQAUCJswKkJzECBgBI2AEjIAJ0zpgBIyAETACRqACARNmBUhOYgRmAYFPPvmkOeWUU5pvvvlmheo88MADzQsvvLDC7/7BCMwyAugsupuGH3/8sdX1d999N41a0r9XOsJ8+OGHm3322af59NNPlxR4Fz4dBP7666/m77//no7wJZT6wQcfNDvttFPz9NNPZ2txzjnnNLfffns2rubHf/75p3nvvfda+YyNf//9tyab00wBgd9//32lwR+dRXdzAV3fYostmrfffjsXvSS/DUqY77zzTrPzzjs322yzzX8+hxxySPPDDz9UNxCDd/nll/9HhmRCeOOGzz//vNlyyy2b9ddfvzUONXJ+/fXX5sQTT8zWRXWKz+23375VgNdee63BCDlMF4EHH3yw2XzzzZvVVlutWXXVVYt9+9tvvzWvvPJKgw4st/DVV1+1ZHnDDTcUDekkhPn99983Rx55ZKvjxx57bLPmmmu25LtcSPPDDz9s9thjjxXGKLYIm1QbsDtXXXXVCnIY31deeWWtmJHSQY6nn356s8EGG7T6iw4ffvjhDfqahp9++ql58cUXm++++y6NWrZ/dxEm+kf8JptsMlI/ThOMQQnziy++aC677LLmwgsvbLbbbrsFBRiFoGjsk08+2Q5alAdDCOEi86abbmq+/fbbsfAA/CuuuKKtEzJRvJrwxx9/NPfdd19b/tFHH73QJgYjdUo/++23X7P66qu36fbee2+vZGtAniAN/Xjeeee1Rq5EmPQhkx7i11prreaNN96YoMTFzYoRP/XUU5vdd9+9c9I5LmFK/tprr92OieOOO67Fabfddmtwiy2HwISCCTZjEVuhyRP9femllxYnGWnbcHlvu+22C7YHu3X22We3cieZqKflxL///PPP5o477mj7eJ111mmxzxEmCw50gDax6qKu8xC6CJP2sWA59NBD2wkd35c6DEqYagxKcPzxxzcHH3xws95667UKXEtQ7M8cccQRzUEHHbSgHBDxpOGtt95q1l133VYmSvf888+PLPKuu+5q8zMgu/J//PHHDWRJObjRGNAO00XgmWeeafHOTc6YkbNKoD/40I/LJdAuSP6xxx7rrPK4hIm7i3GBZ4T9JPADI1aaTDSWW8CtvPXWW7fERzv4/uWXX1Y1g4kxEyvykDdHXFWCxkjEhP7MM88slku71DejTPjHqMqiZukjTCrz8ssvL3g9FrVymcKmQpisAhmAt9xyS7Ppppu2SvDQQw9liv/vTygNro8bb7yxnSmitLiKcFtMErTCYBaNTD6jGs2o0LSpb4YHSYIBZeFyYSbvMD0EmMCAdY4w6Tu8ExgaZunLZQLDjPrAAw9sdtlll87VJ
aiOS5jgAm5McFlRXnLJJc0JJ5ywbD0jEMuuu+7a3Hvvva2Rpc+feOKJXsUD66OOOqrdw9XkajEJU31IX+TKxYax2iWeycwsrLZ6Qa1IUEOYGgejTH4qih4ryVQIk9XcDjvs0G7WijRuvfXW3gqS76STTmr91SLa6667rjdfXwJm5zvuuGNzzz33tAqH0tFRowSMCW6qkkLnZFF30m+88cbNRx99lEvi3wZCoIswBypi0cVoZl0zBsYhTHmC0NGaMhYdgDEKhDD32muvBi8P2yO0rWa1zEobwmQiPIuEOQYUyyJLDWHSEFb/9CXPpQxTIUxWb8yScIXxpKEM6K7ALAJ3CJv0cq9x+IDDM5MEXLwQHYdDIGStMs8///yRxEaXCPu0NUFGnPbXrLBrZDpNHgFhnVth5nPM9q94JPBMoK/obV8YhzCjq/rxxx/vK2JZxIswaZtWz30TVjwQ7HVijCMmuZXeNEGgD7EVi13uNNvUJ7uWMGV/lxqbwQmTgc5dMWas0Y3Z19A777yzVXDyQEgoDpvbk+xfIouThSpboNcQeNrREB75+vYvYz4ZcfLNyww+tm+WvgvreSFMtjU4gMKn5qDbOITJHTfOGPCZtftu4+pWJEy8OpAl4w/yLAUO1HCwhPQmzBJK0/m9ljB//vnn1mNQsx02nZr+X+rghIny4f7EgBEgChQW12zpagnuE9whxLPS5KQbedhXwW00buDuGqT76quvtiJYbY6zoQ/xavY3ConLiNMW8s96ePPNNxdO/bJ3zFF9juyzf8IKhP003FU8uSOlazPE33///a0rjHhcYBwgUXyu3crDaWMmIRxs4ZDYSy+9VDzVSD/ggWDlRTno1EUXXdS60YR1SpgYQK4KcIISdz/5cl4L9I7By1aCTjlz1J9+y+1XMzFknwy5nNLFDUi92G+nzOuvv769YhXrmMOh9Ju8IbVjoJYw2b9lT++aa65pdOqbVSx1h1To11x7S/Wctd8jYaJj6CLjD/csRjcXwOOMM85ozxmMQpjoDJhttdVWbRmccmX/lzqUQjqW0BsWC8iSjdEEXzLoj4svvrjVtTguFa+ndJ36MKZoN/YK/ScuDVxdufnmm1u5lM1Y1PUZymRsob/8jj7nZKQyR/27ljAZ+zoUhQdyqcLghMlAx+hoZaiTpSWiwfCwmS2CpaOG2L9ELsfxcb3ynTDKYIgdMs7+JflRBpSWz3JYYeK25s6T6gz5PProo62xYdXPPg/GBdIhDdd0MMAYozSeActARdHTwMSFPBgY3GAYMu5HMstXPvWZ8mJQMOrE77///q2+oGOs/BnUHIyhTilhQvgcAhEJkka6JtkYDpEHdaY+/MYeIm2FzKlnbIvuz+kqAHKpPyTKFgATiPfff7/1cJCflc4odwLl0ajdOqglTPqYNvFR3akfE0l+A8saF7Cwm7VnJEzqhr6iM2zv0J9pQM8gS9IRam0Ek3Hw2myzzZqnnnqqnShRNvjl9AXZ5NGE7IILLmj1gVUtYwf8sZHoUUqYTPD67hozphgDa6yxRqt76O8vv/zSjl/GBJ/0TVAsULiNQH0plw9jjMOajDHOftCmc889t8VwGif+awkT/DgHQx1rzsO0nTmFfwYnTAgynmzVzL/k9uHOJbN0Gcih9i8pF+WNF9W1rAd0lL12xoTSoHDkqyW+OCMi3yh7mHFGCTaTfO6+++4FbGv1h4GFgcHQYFTTt8vICIEJp07TeLAnb859AvGhH8RjvGPQyWIGsLwCxDP4dT8w6orySiY4Uyf6Kw2fffZZa3RIkxJm7N8999yz4YK4AuTBtQs+uTeOsIpmJo5cjBVGK57CRa8xQsTzlJ5LfumpbQkMSk2oJUzJoh5snVAv8s5LoC9ZtWlsc6VEXiXaGSc9tBnCYqIj71cNYdK/kEeqp8hjUoQOQqR4zhQgS34jTzpelE/u45QwJeP1119v97RzOi47S3/iSeGNVwoQH+MNHHJXbOKEER3G28eYUuA7X
iVkd7m2lX6U5yiEqUnkUurroISpQRhJRfskgJ0aKmZFzOwjqclQlFakNZ2BwaODcXXEgGKgjNQF5aH8mqCOQunSNpTyx1Vp7cENyWLlBA6jEiV5cJ0yIPXhkn6tkVb5kUByF79jfI4EFA9e6f1brbpLe3Pq/ygXYkUWs2yILxdkMHLGhPTREKZ9KMKlDNob8SJOWwRRr2Md1KbSKkbxo7wMAKOAntZOtEhPObUh6mffFSvwiAa0toylSJcSJgRJn2rMp4SB5wCdU4h6kiMusGDVj7xcfDx5LHLBDasXZ0BmUb9ULk/1eU4u8RpXOR0XieNJ4TZADNg5TRpKh7tUdumAlOJrtwjQF8bwYYcd1r7EhglDLqCzyK4JGuNxQVaTb8g0gxIms7S4f0lFMf5yNcSBiSLjL4+rjGicajsmBwadgEshrhRIhxsNsFH2nNLlZFFPKcsoJK49KMoaxVDm6rDYv2lgUvfcfsEo8ZGc4gq/1L8M6IhZnOR0DRQNplK/RkMY6yRsWSlSv3QFEssvDWwRYmkSpvhar0Y0urm6qs7xSd0opzZo6yM3qYkytPrIeQtiuln5jm6CM08FXJpMZtCrOAHB3rCaip6DqCc54or2LBKtyuLJxIqypONxvEQbGPPwXXYmVy7xklPS8dLEJrappCMqm8lhbnKk+FLd0rYwWUCW3q5WKpffkV0TNMZr61Ajc9Q0gxJmun9JZSBRXKMoUFQw0jLbYkAqaBCTtjSbV9rSk9Uq74sF1NwKTa/sK7mIU7lxJj5KR6EwtIOPZpqp7Fn9WwOTuucM9rjxsX9x9WLY0o/2UEUuMU/XwNJgKhmTaDRybVJfYHRonwYyE0Dt8ZTKJy1Yqc6SpWdfvNLpGUm6q65Kz5O6UU5tYOUPWfYRIXXhLACvniutjHJl4gZM+3bUv1md1HqBVAf6jnJ4KkAAcinGO5nYIFz90QZFPcmN90i+HArLtUnvhVV+eahK40n1pA9Jo3z6XU+Nu5KOKx0TLjxL2J3TTjutraP28Es60ld2X7zK5imbyUqctoN5qR+pD7JrgsZ4CZ8aGZOmGZQwmT2ljYmDnz0TzYJ05zI2YNL9S2TTSSVFRrmlzH3Kq3pJSUlfS+JMEnQIBYP79ddfS9yyeMY25wz2uPExH0YY41T6gCGrvpina2BpMJWMSTSEuTZhVK+++up2zxbjwqEkDu6gk3yn/0vl9xFiX3yqFHHM5Oqapudv6lYyhrn0jFXaRNtYWQ8dwPvZZ59d2BrQFsEoTwit66R1rs7oS0qYpFMfaC9arlpcsjFEPUltGemkZ2DHlbWS/vI7xEE5Kps8Xf1JH5ImVy5layyUdBxSQgb6yyTvmGOOaXgHLmcOdJK3pCN9ZffFRwy1DRdX8zE+fqc+yK4Jwr6ET42MSdMMRpiQle5fppWKYGOYbrvttlbZUveX9q9GcX3GsjgoQt6Sv5y0UXlL/vwoU7NDZuNdyh7zaM+ttMEf087idw3M0gAfN55+2XDDDTvJJ8UjltU1sDSYSsYkGsK0H9n/0YutIZC4TxrJq1S+dApDTTlp6ItP0zMudIQ+rWuaVn9TN8qpCVE+5aTjsEbGrKZBX3KEmd7J5G4rLkNcrDFEPckZZq3MGRu1eKv/S+NJ5Uc7GVe9itdYyOk4cRwqooyTTz554RATeWObSnXuK7svXnXkiV3FBZ67vhXT8Z36ILsmaIzn+qUm/xBpBiNMvT82N8Dl00eROYHJy9VZQcRQs78V06ffIWJOu3X9F0jkEQHWKHw0LLUkzik5Ns5zB0jSOpf+joOSek7y4f/+HHUFoYFJubn+HDderhrkdu1HRlx0gZ88XcZdgylnTJAXjUZsExM9DicgH1d+PIBGvpQw0QnaEe8HyyAORZiUKwNVMnARI6WvTctY0UGm0hF9cOHQHNspjF/+Xg4B3cwRpib09DPeH1bYunsZ2xX1JGeY4xZB7ZUf9A17QNm5MwEqX32eK5c0GnepjtOfcjnnz
m7ENqEjrNrR4dinfWX3xVM/VrNsg+27774tYbLlxt/xxLvaqif1QXZN0BjvsgM1ciZJMxhh5vYvVTGRFCsujFIOwKiIta5PyefJqo4rASkRxzR8F+gob9xTTdPxdzTwJSWO+WgDBobBwVWDqJAxXd93FJp2oOiTfnIz1b7yNTDBKJKL8k0Sr73d0hF3ygBHLoCDARiK0Lrch+rX1JioztFoxDZRhvbYcwMxJUz9HWUw6MFqSMLUnbPasYDRqSVMHVxBT9NTzODFpIDrSLjyOBDDfn8unbCdpWeJMKmjrkOx+mFrJncvM+pJbsxHfYSkIKtcYHV11llntQcN49WWrv7sIyWNu1THo+cmNwGKbUJH+JurN8hT6Cu7Lx453Ptk+4l9Yf63JsYxZcXJpcrTcxTC1Djrs9uSPY3nIITJAGNlVzphJWOGUSkRiU5H1i7lIxha1aX7ETGNvsfVG0rQFZgE6N2zXZ3EIOKQA4rM/gEXf8cly676LFacBuY0CJN9Fu3v4ppHd2KQLjF7F4YYBFbt6EbOyJGOmSz1TY2JZEejEckuToq0x648PKMrD31ZLMLUfn6OxGP99J261RKmDq6UDvxg4MGf6xCMKe37qaxZfjJmuX6Ue9UfkyPpHk/+TkPUkxxhkj7qI/fI04A+skcf+4PVOhOUUrm6FoIOl8rVuEx1vG+xwZhh7CCbOk2LMMFB4yk3llKc+Jv69Nlh5dOWXc3eqPIM/RybMDFsKA5Eh3Joo5kZFJv9AKegjmYmj2Io4P7iEAAzP/3/lxAUIPI7s9vUoCovckgDiHpjCXc6KTud9SEDWZTDakXuEerDm2wwoMojucim01EyPlyB4bf4QR4TAF04xh0SLyurrsvlycCjfRx+0QDjv3viN+Jwj6bxKHsaz+u2lF/x8S036IMucUOaWgWzssatg6cg6gn998gjj7QHGegz3t6jQL/x2jCdZKWvVCb1VZ35T3ox/Gk8cmTMkIEBlM5B7ugUpzXJxwoXsuGVgRApJxHRAb0lCEPGm36ee+651g1Om2M85VMPxasNuadWDV2r6piPNkcDHePS7/L4lGTjwqe/9Wq5kgFP5S7V39QXTJmobrTRRm1fsdqnL5gg0w4C/ao7mfHkOvGkQ4+RIV1CBnrH71F/kcWEBrtDnzNZ1uQOWchOL/+j46w40SMmQbI3yMJmsHXCSwOIl56o7um4ZGwxRqVHlK2XZ2CL4jUZxsoBBxzQukmRzf4mNhJbxWqQMrDhXH0jXm3md+qIjsd4PIR481S3Ftjwj/Q2t9INyRa+1hKmrgSOeqd9oaCBvoxNmNxxxLABcvqhQ+OGL0aLV0IBvAKKxSmuNG/8W/eYlCc+eYVZTKvvuVlzdLspXXzG+krxYnzpO8TL6TPyQAIytLGey+k7yltqK3HRU5Cm64vHoMcAGWmihSwGOoaKiVMkS+UBW3QKPQJ3VhG4dXEXXnvttS3RpXWivl11Jo6AwYE0kSXZGAY+ECgGWe8kxUgyycKYQCRpmfytFQBt7opX23JPyoTQcvqcSz8KYbJ6pF7k6dJZDC/Gu8Zzk6vTYv2mCXkO69RNTpvoV/IoxFVlToawUno9ITr0FZ3hw5UobAl6Tf+lAZvHOEGH0HX0Fz3mfARnO/T/XaoOqntpXErPKAdC5s4jclm80EZkQ4SQHmNKB9uYrDKWutpN2Rx+K+m46pa2ESIFC42tND79m7altiFNw9/YC/Ba6jvtYxNmrlH+zQiMigD7GwxMXGl4JboMOLKJJx2GDyOgmT2/8TdGoEZOrp7IYnKHbOTEKw3aV+7aj8nJnOQ37ffGiWZJXi1hQvQ68NPn2qJ8JjKsph3KCEBWYMQqlH28voCeoavoWdRVPG7oHR95XfpkpfHoZ0426SiXxYPGTJp3iL8hQE7Cs9KsCbWEKbdy19ZYTXmTpjFhToqg8xuBKSGg/dOa/aAuwsSIylDqwE/fypX07LexhYGBxVUZvUZTarLFLmMEmMzib
h5lFVhDmMjFlY63g0nGUgYT5lKi77KNQAcCEBXu05pVXokw4yErtjEgPdyGIsJS8XJzsrplxcK2w7irnlIZ/n2+ENA2Qs0ETy2vIUydMu7TWcmc5tOEOU10LdsITIgA+2TsOcVTwzmRJcKMe7i4sziUxz5X7mpXlIvrlruyHGjirEGtiy3K8PeVCwEOJ+G5qD3wAzp9hMnqkhsYTBpnQQdNmCuXTru1yxABTmFyIIl93lIoEaZWlBxO4WQnhzVYLdbsY5FGrtxSuf7dCAgBDvwwGRvFdd9HmOwLc5iKdJDnUgcT5lL3gMs3Aj0IQFy4U9lTxD2aCyXCJC8ngJmh8+F7DVnmyvBvRiBFgANKeCC4wsPKcpT9S2R1Eaa8HNyxnhWdNWGmGuC/jcAMIiDSTO/3qaoQZtd/HaV0fhqBIRGQyx+vBW/4if9dY005ECbbDWlA35HJh+s4sxJMmLPSE66HEehBAJcUBytys20udud+7xHpaCMwEQK8GIG70bwIgT3yUXWQE9y5q1pc4+LKTbzaNVFFB8pswhwISIsxAkbACBiB+UbAhDnf/evWGQEjYASMwEAImDAHAtJijIARMAJGYL4RMGHOd/+6dUbACBgBIzAQAibMgYC0GCNgBIyAEZhvBEyY892/bp0RMAJGwAgMhIAJcyAgLcYIGAEjYATmGwET5nz3r1tnBIyAETACAyFgwhwISIsxAkbACBiB+UbAhDnf/evWGQEjYASMwEAImDAHAtJijIARMAJGYL4RMGHOd/+6dUbACBgBIzAQAibMgYC0GCNgBIyAEZhvBEyY892/bp0RMAJGwAgMhIAJcyAgLcYIGAEjYATmGwET5nz3r1tnBIyAETACAyFgwhwISIsxAkbACBiB+UbAhDnf/evWGQEjYASMwEAImDAHAtJijIARMAJGYL4RMGHOd/+6dUbACBgBIzAQAibMgYC0GCNgBIyAEZhvBEyY892/bp0RMAJGwAgMhIAJcyAgLcYIGAEjYATmGwET5nz3r1tnBIyAETACAyFgwhwISIsxAkbACBiB+UbAhDnf/evWGQEjYASMwEAI/A+nGemuVW2s7wAAAABJRU5ErkJggg=="
    }
   },
   "cell_type": "markdown",
   "id": "824c29cf",
   "metadata": {},
   "source": [
    "### MAD (Median Absolute Deviation)\n",
    "\n",
    "![image.png](attachment:image.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3ed8364",
   "metadata": {},
   "outputs": [],
   "source": [
    "# MAD\n",
    "def filter_extreme_by_MAD(series, n=3):\n",
    "    # Median of the factor values, x_median\n",
    "    median = series.median()\n",
    "    # MAD: median of the absolute deviations from the median\n",
    "    mad = (series - median).abs().median()\n",
    "    # Upper and lower bounds: median +/- n * MAD\n",
    "    max_value = median + n * mad\n",
    "    min_value = median - n * mad\n",
    "    return series.clip(lower=min_value, upper=max_value)"
   ]
  },
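  {
   "cell_type": "markdown",
   "id": "b82d5f13",
   "metadata": {},
   "source": [
    "A small worked example may help show how the MAD bounds behave. The numbers are made up purely for illustration. Take the series $x = (1, 2, 3, 4, 100)$ with $n = 3$:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "x_{\\text{median}} &= 3 \\\\\n",
    "\\lvert x_i - x_{\\text{median}} \\rvert &= (2,\\, 1,\\, 0,\\, 1,\\, 97) \\\\\n",
    "\\text{MAD} &= 1 \\\\\n",
    "[\\text{lower},\\, \\text{upper}] &= [3 - 3 \\cdot 1,\\; 3 + 3 \\cdot 1] = [0,\\, 6]\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "So 100 is clipped to 6 and the other values are unchanged. For comparison, the same series has mean 22 and a sample standard deviation of about 43.6, so a 3σ band of roughly $[-109, 153]$ would leave 100 untouched: the outlier itself inflates the standard deviation, which is why the median-based bounds are more robust."
   ]
  },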
  {
   "cell_type": "markdown",
   "id": "03368b1a",
   "metadata": {},
   "source": [
    "### Percentile Method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e358a131",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Percentile method\n",
    "def filter_extreme_by_percentile(series, low=0.025, high=0.975):\n",
    "    # quantile() does not need sorted input, so the original row order is kept\n",
    "    quantiles = series.quantile([low, high])\n",
    "    # Clip to the [low, high] quantile range\n",
    "    return series.clip(lower=quantiles.iloc[0], upper=quantiles.iloc[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f9b6de2",
   "metadata": {},
   "source": [
    "## Standardization: Converting to Comparable Scores"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d253defc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# z-score\n",
    "def standard_normalize(series):\n",
    "    # Mean and standard deviation for z-score standardization\n",
    "    mean = series.mean()\n",
    "    std = series.std()\n",
    "    return (series - mean) / std\n",
    "\n",
    "\n",
    "# min-max\n",
    "def max_min_normalize(series):\n",
    "    # Maximum and minimum of the series\n",
    "    max_value = series.max()\n",
    "    min_value = series.min()\n",
    "    # Rescale to the dimensionless range [0, 1]\n",
    "    return (series - min_value) / (max_value - min_value)"
   ]
  },
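  {
   "cell_type": "markdown",
   "id": "c93e6a24",
   "metadata": {},
   "source": [
    "The two functions above correspond to the following formulas. The symbols $z_i$ and $x_i'$ are notation introduced here for illustration.\n",
    "\n",
    "$$\n",
    "z_i = \\frac{x_i - \\mu}{\\sigma}, \\qquad x_i' = \\frac{x_i - x_{\\min}}{x_{\\max} - x_{\\min}}\n",
    "$$\n",
    "\n",
    "After z-score standardization the series has mean 0 and standard deviation 1; after min-max scaling it lies in $[0, 1]$. Both are undefined for a constant series, since $\\sigma = 0$ and $x_{\\max} = x_{\\min}$ make the denominators zero."
   ]
  },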
  {
   "cell_type": "markdown",
   "id": "b24828a5",
   "metadata": {},
   "source": [
    "## Neutralization: Removing Industry and Market-Cap Effects"
   ]
  },
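  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6fc85a05",
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.api as sm\n",
    "\n",
    "def industry_and_size_neutralization(factor_df, factor_name):\n",
    "    # Regress the factor on the industry dummy columns (named by ind_code) plus 'size'; the residual is the neutralized factor\n",
    "    result = sm.OLS(factor_df[factor_name], factor_df[list(factor_df.ind_code.unique()) + ['size']], hasconst=False).fit()\n",
    "    return result.resid"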
   ]
  },
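  {
   "cell_type": "markdown",
   "id": "d04f7b35",
   "metadata": {},
   "source": [
    "One way to read `industry_and_size_neutralization` is as a cross-sectional regression. The sketch below assumes that `factor_df` holds one 0/1 dummy column per industry code (named after the code) and a `size` column, as the column selection in the code suggests; the symbols are notation introduced here.\n",
    "\n",
    "$$\n",
    "f_i = \\sum_{j} \\beta_j \\, I_{ij} + \\beta_{s} \\, \\text{size}_i + \\varepsilon_i\n",
    "$$\n",
    "\n",
    "Here $f_i$ is the raw factor value of stock $i$, and $I_{ij}$ is 1 if stock $i$ belongs to industry $j$ and 0 otherwise. The neutralized factor is the fitted residual $\\hat{\\varepsilon}_i$, the part of the factor not explained by industry and size. If every stock belongs to exactly one industry, the dummies sum to one and already act as an intercept, which is consistent with the code not adding a separate constant."
   ]
  },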
  {
   "cell_type": "markdown",
   "id": "9da12ab1",
   "metadata": {},
   "source": [
    "#### References"
   ]
  },
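  {
   "cell_type": "markdown",
   "id": "24aebb08",
   "metadata": {},
   "source": [
    "《光大证券多因子系列报告之一：因子测试框架》 (Everbright Securities Multi-Factor Report Series, No. 1: Factor Testing Framework)<br>\n",
    "《东证期货_商品因子系列（一）：商品多因子模型框架再探究》 (Orient Futures Commodity Factor Series, No. 1: Revisiting the Commodity Multi-Factor Model Framework)<br>\n",
    "《股票多因子实战》 (Equity Multi-Factor Investing in Practice)<br>\n",
    "《因子投资：方法与实践》 (Factor Investing: Methods and Practice)"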
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
